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Abstract 

The goal of a denoising algorithm is to reconstruct a signal from its noise-corrupted observations. Perfect 
reconstruction is seldom possible and performance is measured under a given fidelity criterion. In a recent work, 
the authors addressed the problem of denoising unknown discrete signals corrupted by a discrete memoryless channel 
when the channel, rather than being completely known, is only known to lie in some uncertainty set of possible 
channels. A sequence of denoisers was derived for this case and shown to be asymptotically optimal with respect 
(3JT)' to a worst-case criterion argued most relevant to this setting. In the present work we address the implementation 

^ ( and complexity of this denoiser for channels parametrized by a scalar, establishing its practicality. We show that for 

, symmetric channels, the problem can be mapped into a convex optimization problem, which can be solved efficiently. 

We also present empirical results suggesting the potential of these schemes to do well in practice. A key component 
of our schemes is an estimator of the subset of channels in the uncertainty set that are feasible in the sense of 
being able to give rise to the noise-corrupted signal statistics for some channel input distribution. We establish the 
efficiency of this estimator, both algorithmically and experimentally. We also present a modification of the recently 
developed discrete universal denoiser (DUDE) that assumes a channel based on the said estimator, and show that, 
in practice, the resulting scheme performs well. For concreteness, we focus on the binary alphabet case and binary 
symmetric channels, but also discuss the extensions of the algorithms to general finite alphabets and to general 
C/3 , channels parameterized by a scalar. 

Index Terms 

Binary Images, Channel Uncertainty, Convex optimization, Denoising algorithms, Discrete universal denoising, 
DUDE, Image denoising, Minimax schemes. 
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O ' I. Introduction 

00 . 

In [10] it was shown that optimum denoising of a finite-alphabet process corrupted by a discrete memoryless 

in 

channel (DMC) whose associated transition matrix is invertible can be achieved, asymptotically, without knowledge 
of the source statistics, provided the channel is known. In [4] the problem where, in addition to the lack of knowledge 
of the source statistics, there is also uncertainty in the channel characteristics was addressed. Motivation for the 
setting of channel uncertainty is also discussed in [4]. The main focus of this paper is on algorithms for implementing 
the denoising schemes suggested in [4] for channels that can be parametrized by a scalar. An example of such a 
channel is a binary symmetric channel (BSC), parametrized by the crossover probability, which uniquely defines 
the channel. We shall focus for concreteness on the case of binary alphabets, since this is enough to capture the 
essence of the problem while minimizing cumbersome notation. We then demonstrate how the algorithms extend 
to the case of non-binary alphabets for channels parametrized by a scalar. We shall also make some observations 
regarding fundamental differences between the denoisers suggested in [4] and the Discrete Universal DEnoiser 
(DUDE) of [10]. In particular, we show that the suggested denoiser is not a special case of the DUDE, and that 
the DUDE will in general be suboptimal under the performance criteria of [4]. 
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In the setting of denoising for a known channel [10], there is a one to one correspondence between the channel 
output distribution and its input distribution. Our present setting is fundamentally different in that, given a noise- 
corrupted process, there may be many source-channel pairs that can give rise to it. As an example consider the 
binary symmetric channel. If the output process is a Bernoulli(a) process, there are uncountably many source- 
channel combinations which can produce the output distribution, e.g., both a Bernoulli(O) process going through 
a BSC with crossover probability a, and a Bernoulli(a) process going through a noise-free BSC (with crossover 
probability 0). This is a basic difference which renders the task of attaining the performance of the optimum non- 
universal Bayesian scheme impossible, even for a scheme with complete knowledge of the noise-corrupted signal 
statistics. This point is elaborated on in [4]. 

It has thus been argued in [4] that, under these circumstances, given any noise-corrupted source, a natural criterion 
under which the performance of a denoising scheme should be judged is its worst case performance under all source- 
channel pairs consistent with the given noise-corrupted source distribution. This clearly bounds the attainable worst 
case performance since, in our universal setting, even the noise-corrupted signal distribution is not a priori known. 

A family of denoisers that are universally optimal in the sense of asymptotically minimizing the said worst case 
performance are presented in [4]. The main contribution of the present work is in establishing the practicality of 
the denoisers of [4], by developing low complexity algorithms for implementing these denoising rules, in the case 
where the channel is parametrized by a scalar. 

When considering the binary alphabet case we shall assume the setting of a binary signal with an unknown 
distribution, corrupted by a BSC with a crossover probability S known to satisfy S < 1/2, but otherwise unknown. 
As will be detailed, our work in the binary alphabet setting conveys the essence of the implementation of the 
scheme for an arbitrary finite alphabet and an invertible channel matrix that depends on a single parameter, only 
known to lie in a given uncertainty set. 

After introducing some notation in Section |n] we turn in Section EH] to the binary minimax setting, where the 
goal is to minimize the maximum expected fraction of errors made by the denoiser over all source-channel pairs 
consistent with the noise-corrupted source distribution, where the channel also lies in the uncertainty set. In this 
section we present the denoiser suggested in [4], as well as one of its performance guarantees. We then present our 
algorithm for efficiently implementing the suggested denoiser. With careful analysis the problem can be mapped 
into three convex optimization problems. Each problem can be solved efficiently, and the denoiser achieving the 
minimum can be found. 

In Section HVl we discuss a key component of the algorithm, namely a method for estimating the set of "feasible" 
channels. For a given noise-corrupted source, we say that a channel is feasible if there exists a clean source such 
that when passed through the channel can give rise to the given noise-corrupted source. In particular, we present 
a low complexity algorithm to efficiently estimate the set of feasible channels, and compare its performance to a 
scheme suggested in [10] on binary symmetric channels. In Section[V]we explicitly describe the binary minimax 
denoiser via its pseudo-code. Furthermore, we analyze the complexity of the algorithm. In particular, we show that 
the suggested minimax denoiser has the same order of complexity as the DUDE of [10], namely linear in the size 
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of the data. 



We present experimental results in Section IVII We compare the performance of our algorithm, to estimate the 
set of feasible channels, to that of an algorithm suggested in [10]. We report the results of several simulations, 
comparing the performance of our denoiser with various versions of the DUDE of [10]. We show that the minimax 
denoiser performs well, in many cases outperforming the DUDE using various estimates of the channel parameter. 
However, when the estimate of the channel parameter is accurate, the DUDE using that estimate will perform 
close to optimally. In particular, it may do better than the minimax denoiser whose performance is inherently more 
conservative. The simulations include both one- and two-dimensionally indexed data. 

In Section IVHI we demonstrate ways to extend the algorithms of Sections [III] and IIVI which were developed for 
the binary case, to the more general case of non-binary alphabets and channels parameterized by a scalar, with 
naturally structured channel uncertainty sets. We briefly discuss the general case of multi-alphabet implementation 
for general channels that can be appropriately parametrized by a scalar, and describe a method for solving the 
optimization problem that arises. 

In Section IVIIII we show that the min-max in our problem setting is not equivalent to the max-min, implying 
the minimax denoiser is not, in general, a special form of the DUDE of [10]. We conclude in Section fTYl with a 
summary of our results. 



We assume that the components of the noise-free signal, the observation and the reconstruction signal take their 
values in the same finite alphabet A. Let C(A) denote the set of all invertible Discrete Memory less Channels 
(DMCs) 1 with input and output alphabet A. We identify an element II 6 C(A) with a x |_4| stochastic matrix, 
II(a, b) denoting the probability of a symbol b at the channel output when the channel input is a. Let S(A) denote 
the simplex of probability distributions on A. 2 Denote by A : A x A — > K a given loss function where A(x, x) 
is the loss incurred when estimating the symbol x with the symbol x. A randomized n-block denoiser X n is 
a mapping X n : A n — > (S(A)) n , i.e. upon seeing the noise-corrupted signal z n £ A n , the z-th reconstruction 
symbol is a 6 A with probability X^ (z n )[a\, where X^(z n ) 6 S(A) denotes the i-th component of X n (z n ) and 
Xj™j(z™)[a] denotes the probability that it assigns to a £ A. The construction of randomized denoisers is motivated 
by the minimax performance measure which is introduced in Section ITTT1 

Uppercase letters will denote random quantities while lower case letters denote deterministic values. Bold 
notation will indicate doubly infinite sequences e.g. X = (. . . , Xq, X\, . . .). Also, X\ denotes the sequence 
(Xi, . . . , Xj), omitting the subscript when i = 1. For x n £ A n and z n £ A n we denote 



i=l aeA 

In words L^ n (x n , z n ) is the expected normalized cumulative loss of the denoiser X n on the individual sequence 
pair (x n , z n ), where the expectation is with respect to the randomization of the denoiser. 

1 A channel is invertible if the inverse of its transition matrix exists. 

2 Similarly, we will use S k (A) to denote the simplex on fc-tuples on the alphabet A. Also, cS°°(.A) will denote the set of all distribution on 
doubly-infinite sequences that take value in A. 



II. Definitions and Notation 




(1) 
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With the exception of Section IVIII we consider processes with binary alphabets i.e. A = {0, 1}, corrupted 
by a BSC and take A(-, •) to be the Hamming loss function. In this case a n-block denoiser X n is a mapping 
X n : {0, 1}™ — > [0, 1]", S [0, 1] denoting the probability that the i-th reconstruction symbol is a 1 when 

the denoiser observes z n . For x n € {0, 1}" and z n € {0, 1}™ Q becomes 



1 " i 



»=i 



(2) 



Let Tk — {f A 2k+1 i— ► <S(-4)} be the set of all fc-order sliding window denoisers. A sliding window denoiser of 
order k works as follows: When denoising a particular symbol it considers the k symbols preceding it and the k 
symbols succeeding it. These k symbols before and after the current symbol, form a double-sided context of the 
current symbol. In particular, if we denote the current symbol by Zq, the two-sided context is zzl and z\. 

For a given denoiser / £ Tk, we use f(z_ k )[a] to denote the probability of reconstructing zq with the letter a. 
With this in mind, we define 

1 " 

L / ( 2 ;",z") = -^|^-/(zi+ fe fc )[l]|, (3) 
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namely, the expected normalized loss when employing the sliding-window denoiser defined by X^(z n ) = /(z*_ fe ). 

III. The MiniMax Criterion 

Our setting assumes uncertainty in the channel characteristics. We quantify this uncertainty by assuming that we 
are given a set of channels which includes the true channel that corrupted the clean source of interest. In the binary 
setting we assume the uncertainty set to consist of BSCs. The set is parameterized by the crossover probability 
and, for the sake of simplicity, we assume it is of the form [0,W], U < 1/2. 

For a stationary ergodic source P z let 

Coo(-Pz) = {0 < S < 1/2 : 3P X s.t. F x * 8 = P z } , (4) 

where Px * 5 denotes the output distribution of a BSC(<5) whose input process has distribution Px- Thus when 
the output process has distribution Pz the possible source-channel pairs can be described by the following set, 
{(Fx, 6) : Fx * 6 = P Z ,S E Coo(Pz)}. Further we let 

r(P z ) - maxCoolPz). (5) 

It is easy to see that Coo(Pz) = [0,r(P z )], since if S € Coo(Pz) then §' e C 00 (P Z ). for any 5' < S. Define 

A(P z )=min{W,r(P z )}. 

Hence, [0, A(P Z )] is the set of all channels in the uncertainty set that can give rise to P z when corrupting some 
noiseless source. Note that A only depends on P z through V and this dependence will often be omitted. 
For an n-block denoiser X n let 

£g(P Z) A,Z)= sup E Pxs [L^„(X",Z")|Z], (6) 

{(Px,<5):Px*<5=Pz,<5G[0,A]nQ} 
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where Ep xs [-|Z] denotes expectation conditioned on Z under the joint distribution on the noiseless and noise- 
corrupted source induced when the noiseless source ~ P x is corrupted by a BSC(<5). Thus £^ (P z , A, Z) is the 
worst case performance of the denoiser X n over all channels that can give rise to Pz, given the noisy realization 
Z. In the definition of (Pz, A, Z) we take the sup over the set of all channels in [0, A] intersected with the 
rational numbers, Q. This is to guarantee that the sup is over a countable set and thus that C? (Pz, A, Z) is 
a well-defined random variable. The rationale of using (|6jl as the performance measure in the unknown channel 
setting is discussed in [4]. 



Define now 



Further let 



a4 k) (P z , A, Z) = min Cf(P z ,A, Z), (7) 



Mfe(P Z ,A,Z)=Umsup4 n) 0Pz,A,Z). (8) 

n — >oo 

Therefore [if. is the performance of the best sliding window denoiser of length 2k + 1. Building on the definition 
of fik let the "sliding window minimum loss" be defined by 

M (Pz,A,Z) = lim Mfc (P z ,A,Z), (9) 

k — >oo 

where the limit exists since ^(Pz, A, Z) is clearly non-increasing with k. Since jik is the performance of the best 
sliding window denoiser of length 2k + 1, /i is the performance of the best finite length sliding window denoiser. 

A. Construction of a Universal Scheme 

For a e [0, 1], 6 < 1/2, d € [0, 1] and d x <E [0, 1] let 

F(a, S, d , di) = — — [(1 - <5 - a)(l - S)d + (1-5- a)5d 1 + {a- 6)6(1 - d ) + (a - 6)(1 - 6)(1 - di)l . 
1 — 26 

(10) 

In the single observation problem, for a € [5, 1 — <5], P(a, <5, do, c?i) is the expected loss of a denoising scheme 
which says 1 with probability d upon observing a zero and says 1 with probability di upon observing a one, when 
observing a Bernoulli((a — S)/(l — 26)) corrupted by a BSC(<5). Note that in such a case, the channel output is a 
Bernoulli(a). For a probability distribution on binary (2k + l)-tuples, P Z k , and / : {0, l} 2fe+1 — > [0, 1] we define 
the functional Gk by 

G k (P zk _ h ,5,f)= E F^i^^^J^JtlCl^.^l./rl.Mf]))?^^,^), (11) 
«Ifc,z?e{o,i} fc 

where P^ | z -i z f= denotes Pr(Zo = 1\ZZ\ = z Zl,Z± = z\) under the source Pz k k an d [ z ~k-i z o, z i] denotes the 
(2fe + l)-tuple. We further define J k by 

J fc (P zk A, /) = max G fe (p z , , <5, /) , (12) 

V - k / 5e[0,A] V - fc / 

and 

/MM fc \ Pz± h ) A) = arg mm J fe (p z t ^ , A, /) , (13) 
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selecting an arbitrary achiever when it is not unique. 

In our setting Pz is not known and thus neither is T(Pz) (recall (|5} for its definition), which is needed to find 
A, so we need to estimate it. If we knew the distribution of the /-tuple P z i (induced by Pz) we could evaluate the 
following upper bound on r(Pz): 

Ti (P z i) = max{0 < S < 1/2 : 3P x i s.t. P xl * S = P zl } . (14) 

Instead, we shall use the empirical distribution of a /-tuple induced by the observation of the noise-corrupted 
sequence. An efficient algorithm for calculating Ti(-) is presented in section HV1 This leads us to define 

A,(P 2 ,)=min{«,r,(Pjj.)}. (15) 

Now let Q 2k+1 [z n ] denote the (2k + l)-th order empirical distribution induced by z n and let X n ' k ' 1 denote the 
n-block denoiser defined by 

X^ k '\z n ) = /mm, (Q 2k+1 [z n ], Ai(z n )) k+l<i<n-k, (16) 

where, with slight abuse of notation we write Ai(z n ) as shorthand for A; I Q l [z n ] ). Ai(z n ) is our estimate of the 



set of feasible channels, i.e., the set of channel crossover probabilities belonging to the uncertainty set that can give 

■n,l 



rise to the channel output distribution. The denoiser ' (z n ) can be arbitrarily defined for i outside the range 



k + l<i<n — k. 

B. Performance Guarantee 

We shall now cite one of the results of [4] that provides a sound performance guarantee and therefore justification 
for the use of the schemes considered in this work. For a sequence {ipi} of non-negative reals let S^^(A) denote 
the set of stationary distributions whose i-th ^-mixing coefficient is upper bounded by ifii (cf., e.g., [2], [4] for the 
definition of ^-mixing coefficients). Most sources arising in practice, such as Markov sources with no restricted 
sequences and hidden Markov processes with no restricted state sequences can be shown to have exponentially 
diminishing ^-mixing coefficients [2]. 

Theorem 1 (Theorem 2 of [4]): Let {ipi} be a sequence of non-negative reals with ipi — > 0. There exists an 
unbounded sequences {/„} and {k n } such that if X" niv ~ Xn,k„,l n ^ where X n ' k ' 1 was defined in d!6i . then for 
any Pz S iS^}^), as n — > oo, the performance of X™ niv converges to that of the best finite length sliding 
window denoiser, //(Pz, A, Z). 

C. Efficient Computation of MiniMax Denoiser 

As is evident from dl6l . the "engine" at the heart of our denoisers is the calculation of fmwi k [Pz k k > A] (defined 
in dl3» . which we now consider. By observing that (II 01 can be simplified into a function that is quadratic in the 
maximizing argument, 5, and whose coefficients depend on the minimizing argument, /, (II Q can be written as, 

A(P zk , , f)8 2 - (1 + A(P zk _ , f))S + B(P zk _ , /) 



G fe P 
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where, 

B(P z *_ k J) = ^ (^ + /(c fc ,0) - ry Cfc /(c fe ,0) - rj c J{c k , 1)) P^(c fe ), (18b) 

and ?7 Cjc = (Zq = l\ck)- From now on we will suppress A and B's dependence on P Z k and /. It is easily 
shown that d!7l > is either convex or concave for fixed A and B. For the cases it is convex we need to solve, 



dG k [P zth ,S,f 



= 0. (19) 



88 

The solutions to fT3) are 5' = (A + V-A 2 - 2A + 4AB)/(2A) and 8" = (A - V~A 2 -2A + AAB)/{2A). 

Therefore to find the maximum in G3 , for both the convex and concave case, we have to consider no more than 

four values for 8, the endpoints of the interval [0, A], 8 = 8' , and 8 = 8" . Also note that 8' and 8" only have to 

be considered if they are feasible, i.e., if they lie in [0, A]. Thus d!2t is equivalent to 

AA 2 -AA-A + B AS' 2 - AS' - 8' + B AS" 2 - AS" - 8" + B 

' 1^2A ' 1^28' 1 5'e(o,A), y^W< ' 

where 1 denotes the indicator function. 

To minimize d20i we begin by observing that A and B are linear functions in the minimizing argument, /. 
Since the range of / is convex the observation implies that the range of (A, B) is also convex. Consider now the 
following modification of J20t 



f AA 2 -AA-A + B AS' 2 - AS' - 5' + B ) 
maX I 5 ' f^2A ' 1^28' Wa) j • (2D 

Define the sets 5(A) = {(A, B) : 5' {A, B) e (0, A)} and V(A) = {(A, B) : A < (2B — 1)/(1 - A)}. The set 

5(A) is the set of (A, B) corresponding to feasible values of 8' and the set V(A) is the region where 

AA 2 -AA-A + B 



B > 



1 - 2A 

We can then construct the sets 5i(A) = 5(A) c ny(A) and5 2 (A) = 5(A) c nV c (A). Note that {5(A), 5i(A), 5 2 (A)} 
is a partition of the range of (A, B) for all A 6 (0, 1/2). Furthermore, the constraints 



§ , = A + V-A 2 -2A + 4AB > 

2A ~ 

r/ A + V-A 2 -2A + 4AB ^ . 

8 = — < A, 

2A 

which define the set 5(A), are equivalent to the linear inequalities 

A < 

A < 2B - 1 

2B- 1 

A > . 

- 2A 2 - 2A + 1 
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Therefore 5(A) is an intersection of half-spaces and hence convex. Similarly we can write the set 5i(A) as all the 
(A, B) such that 

2B- 1 

A < 
A < 



2A 2 - 2A + 1 
2B- 1 



1- A 



and 5 2 (A) as all the (A, B) such that 



A > 2B - 1 
2B- 1 



A> 



1- A 

Therefore 5i(A) and 5 2 (A) are also the intersection of half-spaces and hence convex. 

From the construction of 5(A), 5i(A), and 5 2 (A) and the fact that J2 It is non-negative we can rewrite (12 1 1 as 

AA 2 -AA-A+B 

m S!(A) + 1 _ 2A ls 2 (A) 

f AA 2 - AA-A + B AS' 2 - AS' - 5' + B ) 
+ max |B1 s( a), Y^2A l5(A) ' 1^25' ls(A) J " ( } 

To further simplify (I22> . let us examine the max term. The regions where any two of the terms in the max intersect 
is defined by the following linear equations: 

25- 1 

= T - a" (23a) 

A = 2B - 1 (23b) 

a= 2aStT- (23c) 

Since all three terms in the maximization are continuous on 5(A), the maximizing term can only be exchanged on 
one of the lines defined in d23i . Therefore to find the maximizing term in 5(A) we only need to test one point in 
each region of the partition of 5(A) defined by (I23> . Since equations $23b ) and (123b ) are on the boundary of the 
set 5(A), we only need to consider the line defined by (123b ). It is easy to evaluate the expressions for two points 
and see that the maximizing term on 5(A) is 

AS' 2 ~A5'-S' + B 
1-2(5' ' 

We can therefore rewrite i2l\ as 

AA 2 - AA-A + B AS' 2 -AS' -5' + B 
Bl Sli A) + ^2A l52(A) + T^2S' ls(A) ' (24) 

This leads to the following expression for d20l : 

f AA 2 - AA-A + B AS' 2 — AS' - 8' + B AS" 2 - AS" - 5" + B \ 
maxj Bl Sl(A) + _ 2A 1 S2(A) + — 1 S(A) , _ — V e (o,A)j • 

(25) 
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Now define S"{A) = {(A, B) : S"{A,B) E (0, A)}, the set of (A, B) corresponding to feasible values of S" 
and rewrite d25l > as 

AA 2 -AA-A+B AS' 2 - AS' - 5' + B \ 
Bl Sl (A) + 1 _ 2A ls 2 (A) + 1 _ 2gl ls(A) I ls"(A)- + 

AA 2 - AA-A + B AS' 2 — AS' — 5' + B 

AS" 2 - AS" -S" + B 



max <j ( Bl Sl{A) + — — ls 2(A) + — — 1 S(A) ) 1 S » (A ), (26) 



i~W' ls "( A) 



Similarly to what we did for (122b . since the two terms in the maximization are continuous on S" (A), we can look 

at the region where the two terms intersect. Note that the continuity of both terms in (A, B) follows from that of 

5' and 5". On S"(A) the two terms intersect only on the lines defined by (123b ) and <|23h ). Similarly in the case of 

5(A), the lines (123b ) and 1)23 b) lie on the boundary of S"(A). Therefore, continuity implies that the maximizing 

term is not exchanged in S"(A). By picking any point in S"(A) we find that 

AA 2 -AA-A + B AS' 2 - AS' - 5' + B 
Bl Sl (A) + 1 _ 2A 1 S2 (A) + 1 _ 2§l ls(A) 



is the maximizing term. Combining this observation and (126b we can rewrite J20b as 

AA 2 — AA - A + B AS' 2 - AS' - S' + B , 
Bl Sl (A) + ls 2( A) + ls(A). (27) 

Observe that the first two terms in i27\ are linear in (A, B) and hence convex. 
Claim 1: 



(28) 



AS' 2 -AS'-S' + B 
1-26' 

is convex on 5(A) for all A G (0, 1/2). 

Proof: The eigenvalues of the Hessian of (12 8 1 on the set 5(A) are 

I -A(A + 2- AB)\/AA 2 + 1 + 4B 2 - 4B ) 
i ' 2A{A + 2- 4B) 2 J ' 

It is easily shown that the non-trivial eigenvalue has no real roots and is therefore non-zero for all (A, B) E 5(A). 
Hence, by continuity, to determine whether the Hessian is positive semidefinite on 5(A) we only have to evaluate 
it at a single point. Picking any (A, B) E 5(A), we find that the Hessian is positive semidefinite and, therefore, 
d28l is convex on 5(A). □ 
Therefore, all three terms in 1271 are convex on their respective sets. As shown during their construction, the sets 
5(A), 5i(A), and 52(A) are convex sets. Hence the alternative expression J27> tells us that minimizing J20l > is 
equivalent to finding the minimum of three separate convex functions each over a disjoint convex set. This can easily 
be done by minimizing each term in ( 120b over its appropriate partition, which can be carried out using efficient 
convex optimization algorithms. In particular, we have implemented the denoiser using the log barrier method with 
gradient descent (see [1] for a detailed discussion of the log barrier method). After the three minimizations are 
carried out it remains only to select the minimum between the three values. Overall, this gives an efficient and 
concrete way of calculating fuu k [Pz k > A]. Algorithmic complexity will be discussed in Section M 
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IV. Largest Crossover Probability Consistent with Observations 

Recall the definitions of T(Pz) and Ti(P z i) (equations © and J14l ~>. We shall also use T/(Pz) to denote T/ 
evaluated for the l-th order marginal of Pz- We shall say that 6 < 1/2 is ^-feasible if there exists a distribution on 
a noise-free tuple P x i that gives rise to P z i when corrupted by a BSC(<5). Thus, T; is the maximum ^-feasible S. 
It is easy to show that for any stationary ergodic Pz, Ti(Pz) is non-increasing in I and that 

T(Pz) = lim r,(P z ) = inf r,(P z ). (29) 

/ — >oc I 

In our setting Pz is unknown and, consequently, so is T(Pz). In this section we develop an efficient algorithm to 
estimate T(Pz) as a function of the empirical distribution Q l [z n ]. Note that this estimate, Ti(Q l [z n ]), or Ti(z n ) for 
short, is the one implicitly employed also by our denoiser in (I16> . since the estimate of the channel uncertainty set 
that it employs A; is taken as the intersection between the a priori channel uncertainty set [0,U] and the estimated 
feasible set [0,fj. 

In [4] it is shown that under mild conditions there exists an unbounded sequence l n such that 

lim f i n {Z n ) = T{Pz) Pz - a.s. (30) 

n — >oo 

In [10, Section 8-C] a method for obtaining an upper bound on Ti(Pz) is suggested. Let {Cj}j' =1 denote all the 
2 l binary sequences. The idea is to look at the miriij ipf^ where, 

<p™ ± mm {P (z = \\Zz\ = cf , Z[ = cf) ,p(z Q = 0\Zzl = C® , Z[ = cf) } \fi,j, (31) 

(21) 

the conditional probability on the right side being the empirical one induced by the data. It is clear that min, j tp\ J - 

(for n large enough so that the empirical distribution is close enough to the true one) is an upper bound to IY The 

following is a numerical example illustrating that the method of [10] yields, in general, upper bounds that are not 

tight, while T; is guaranteed by (I30i to converge to the true value. 

Example 1: Let Pz be the first order symmetric Markov Process with transition matrix 

/ 0.695 0.305 \ 
Y_ 0.305 0.695 J 

passed through a BSC(.l). For this case clearly T(Pz) > 0.1 and, in fact, the inequality can be shown to 
be strict [7]. So, in particular, Ti(Pz) > 0.1 for all /. Simulations yield (with high precision and confidence) 

miiijj- ip?' = 0.2629, mirijj — 0.2440, mini ,j fj — 0.2415, and miriij ip"p — .2398. On the same simulated 
data one finds f 2 (z") = .1757, f 4 (z n ) = .1451, f e (z n ) = .1266, and f 8 (z n ) = .1107, 

A. Efficient Computation ofTi(z n ) 

Our estimate of T; simply evaluates at the acquired empirical distribution. As we now show, this is a simple 
calculation. Given a stationary Pz, S € (0, 1/2) and I, if S is Z-feasible there exists a corresponding stationary input 
process X which when passed through a BSC (5) yields P z i. Therefore, letting {C^}^' =1 denote the set of binary 
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/-tuples, we can define and rewrite 



/?f 4 P(Z 1 = l|ZVi=Cf ) 



(32) 



p(zVi = cf ) 

where * denotes binary convolution (denned as p * q = (1 — p)g + p(l — g)), II5 is the channel matrix associated 
with the BSC(<5), Ylg(x,z) is the channel transition probability, and 



aV±p(x 1 = l\X°_ l+1 =C?>) Vj. 



For simplicity define, 



A p ( v° 



and 



ef ^ p (z°_ l+1 = cf 



and let 7 and 6 be the associated length 2 l column vectors. Then, 

e T = 7 T nf 1 

where 11® denotes the I th tensor product of the matrix Us. Since 6 < 1/2, IT^ 1 exists and we have 



7 = ^6. 



Using the above notation, we can write 



where 



e,- 



* = nf . 



We can simplify the summation using vector notation. Dropping subscripts to indicate the corresponding vector 
gives 

where denotes component-wise multiplication and is the j th column vector of <£. We can simplify further to 
obtain 

/?(') © e = * T [((n 5 - T )®'e) (a« * S) 

which, following standard algebraic manipulations gives 



(njT(/5 (!) 9) [(n^ T ) 0i e] , 



where denotes component-wise division. We will also use the following easily verified identity 

1 - 26 ' 



(33) 



(34) 
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where 1 denotes the "all ones" column vector of appropriate dimension (in this case 2 l ). Combining (I33> and i34l 
gives an explicit expression for a™ in terms of 

With this notation in mind, Ti is nothing but the maximum value of S G [0, 1/2] such that all the component of 
are in [0, 1]. Hence a given S is /-feasible if and only if all the components of the associated are in [0, 1]. 
With each iteration of this /-feasibility test, we can shrink the uncertainty in Ti by a factor of 1/2. We can therefore 
quickly converge to the true Ti as well as give precision bounds for a fixed number of iterations. In particular, 
neglecting empirical noise, after T iterations of the /-feasibility test, we know our estimate is within ±2~ T ~ 1 of 

r,. 

V. ALGORITHM 

Having discussed various aspects of computing the universal minimax denoiser in Sections IIIIICI and IIVIAI in 
this section we present the pseudo-code for the universal denoiser of Section I111IAI and discuss the complexity of 
the algorithm. As before, the size of the data is denoted by n and the alphabet by A. Algorithm: Minimax Denoiser 

1 initialize Fix context length k, channel estimation parameter /, and channel estimation tolerance T 

2 Construct empirical distributions, P Z k and P z i 

3 Su =U and Su = 

4 for i = 1 to T 

5 if (5u + S L )/2 is /-feasible 

6 S L = (Su+S L )/2 

7 elseif (Su + 8l)/2 is not /-feasible 

8 Su = (S v + S L )/2 

9 end 

10 end 

11 A* = (Su + S L )/2 

12 TOx = min /eSi(Ai) B 

13 fi = argmin /eSi(Ai) B 

14 to 2 = mm /e&(Ai) (AAf - AA t - A t + B)/(l - 2A ; ) 

15 h = argmin /eS2(Ai) (AA2 - AA t - A; + B)/(l - 2A t ) 

16 to 3 = min /eS(Ai) (A5' 2 - AS' - 5' + B)/(l - 28') 

17 h = argmin /eS(Ai) (A5' 2 - AS' - 8' + B)/(l - 28') 

18 X n ' k ' l (z n ) — fi where m, = min{mi, TO2, TO3} 

19 Denoise using X n - k - l (z n ) 

where /-feasible is defined in Section ITVl A, B, and 5' are functions of P Z k and / defined in Section UlIICI and 
the minimizations are accomplished using convex optimization. 

The complexity of the algorithm can be broken down as follows, 

1) Collecting context data is 0(n) and requires |.4|' +1 and |„4| 2fe+1 memory cells 
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2) Constructing empirical distributions, P Z k and P z a, requires 0(|„4| 2fc ) and 0(|^4.r) operations, respectively 

— k I 

3) Calculating A; estimate requires 0(|„4| 2i ) operations 

4) Complexity of the convex optimization steps is roughly 0((|„4| 2fc+1 ) 3 ) = 0(|yl| 6fc ) 

5) Performing the actual denoising is 0(n) 

From Corollary 6 in [4], we note that k and I should not increase faster than log(n)/(16 log |*4|). This implies that 
the algorithm requires memory of order 0(n) and that the complexity of the convex optimization and ^-feasibility 
test is 0(n). In other words, the total complexity of algorithm is 0(n) and it requires 0(n) memory cells. From 
this we see that the algorithm scales gracefully in n. For a detailed discussion of the complexity of the convex 
optimization algorithm, see [1]. 

In comparison, the DUDE algorithm in [10] requires 0(n) computations, and memory of order 0(n). It should 
be noted that the DUDE does not require any convex optimization. 

VI. Simulations and Experimentation 

In this section, we present experimental results obtained by implementation and employment of the scheme of 
Section UlIIAI for the case of a binary signal corrupted by a BSC with an unknown crossover probability. We shall 
refer to the minimax scheme in this case as the Minimax Binary Denoiser (MBD), which we implement using the 
methods presented earlier. We compare the performance of the MBD to that of DUDE from [10] on simulated 
sequences (one dimensional), simulated fields (two dimensional), and on two real world images. For comparison, 
the DUDE of [10] is employed with the channel estimate suggested in [10], the one developed in Section llVl and 
the true channel parameter. Throughout the simulations, the channel crossover probability is, a priori, only known 
to lie in [0, 1/2). 

A. A Modified DUDE 

In [10, Section 8-C] the problem of denoising an unknown source, corrupted by an unknown discrete memoryless 
channel is considered. The algorithm suggested is to estimate the channel parameters and then to apply the DUDE 
assuming the channel estimate in lieu of the unknown channel parameters. The channel estimate suggested (as 
mentioned in Section ITVt is, 

Si = minmin {^], 1 - ff} j • (35) 

We refer to the application of the DUDE using a channel estimate as a Modified DUDE algorithm (M-DUDE). We 
propose an improvement to this algorithm: Rather than using the estimate in (1351 . which in general, as argued in 
Section IIVI loosely upper bounds the largest feasible channel crossover probability, we suggest using the estimate 
Ti in (I29> . which, by i30\ . converges to the true upper bound. 

B. One Dimensional Simulations 

We implemented the MBD as discussed in Section ITTT1 for ID sequences going through a BSC. As a source for 
our simulation we chose a hidden Markov source. To generate the source, a first-order symmetric binary Markov 
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Markov 
Sequence 



BSC(<5 S ) 



X 



BSC(<5) 



Source 

Fig. 1. The box diagram shows the process of creating the source, X, and the output, Z after going through a BSC with transition probability 
S. 



sequence with transition probability p was simulated and then sent through a simulated BSC with parameter S s . 
The hidden Markov source was then corrupted by a simulated BSC with parameter S, see Figure 



We then compare the performance of the MBD to that of the M-DUDE. Since the MBD is a randomized denoiser, 
the actual denoising is performed by drawing a random variable, independently for each symbol, according to the 
distribution given by the MBD for the observed context. The output of the denoiser is then compared to the clean 
source, and the error is calculated. We apply the M-DUDE with two channel parameters: the estimate in d35i . 
as suggested in [10], and our estimate from Section ffVl iy To get an idea of the optimum channel-dependent 
performance we also ran DUDE using the true value of the channel crossover probability <5. Tables U and [H] show 
the results of the simulations. 





Estimate of <5 


Denoising Performance 


Ss 


S 


r 4 


Si 


M-DUDE(r 4 ) 


M-DUDE(<5 4 ) 


DUDE 


MBD 





0.1000 


0.1016 


0.1320 


0.0773 


0.0773 


0.0773 


0.0775 


0.0100 


0.0918 


0.1015 


0.1328 


0.0798 


0.0798 


0.0761 


0.0776 


0.0200 


0.0833 


0.0979 


0.1319 


0.0741 


0.0820 


0.0741 


0.0741 


0.0300 


0.0745 


0.1058 


0.1336 


0.0838 


0.0838 


0.0724 


0.0771 


0.0400 


0.0652 


0.1037 


0.1333 


0.0856 


0.0856 


0.0651 


0.0762 


0.0500 


0.0556 


0.1030 


0.1333 


0.0872 


0.0872 


0.0559 


0.0754 


0.0600 


0.0455 


0.1011 


0.1335 


0.0889 


0.0889 


0.0451 


0.0745 


0.0700 


0.0349 


0.0999 


0.1314 


0.0682 


0.0902 


0.0349 


0.0682 


0.0800 


0.0238 


0.1045 


0.1320 


0.0913 


0.0913 


0.0238 


0.0711 


0.0900 


0.0122 


0.1039 


0.1317 


0.0919 


0.0919 


0.0120 


0.0690 


0.1000 





0.1078 


0.1327 


0.0927 


0.0927 





0.0674 



TABLE I 



Denoising using a hidden Markov source where the Markov source has transition probability p = 0.15. Here 6 3 * 5 is 

FIXED AT 0.1, k = 2 AND SAMPLE SIZE IS 10 6 . 



We note that when both the source is unknown, and there is channel uncertainty, there is a risk of injecting 
noise as the simulation results show. Where this is the case we notice that the MBD injects substantially fewer 
errors than the M-DUDE. This is a consequence of the worst case criterion the MBD was designed to optimize, 
which leads to more conservative denoising. We also note that in a number of cases the performance of the MBD 
is actually comparable to that of the channel-dependent DUDE (shown in [10] to achieve optimum source- and 
channel-dependent performance). 

Finally, we note that the M-DUDE(T4) performs consistently better than M-DUDE(<54), often performing compa- 
rably to the channel-dependent DUDE. This is due to the fact that T is a better (asymptotically consistent) estimate 
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Estimate of 8 


Denoising Performance 


S s 


8 


T4 


84 


M-DUDh(l 4) 


TV ft T"\T 1 T "\ I " / r \ 

M-DUDh(04) 


DUDE 


MBD 





0.0500 


0.0560 


0.080 / 


0.0430 


0.0471 


0.0430 


0.0430 


0.0050 


0.0455 


0.0567 


0.080/ 


0.0425 


0.0485 


0.0425 


0.0425 


0.0100 


0.0408 


0.0618 


0.0803 


0.0505 


0.0505 


0.0412 


0.0434 


0.0150 


0.0361 


0.0617 


0.0800 


0.0513 


0.0513 


0.0359 


0.0426 


U.UZUU 


n m 1 1 
U.UJ 1 j 


U.UozU 


U.UoUU 






U.UJ13 


U.U4Z4 


0.0250 


0.0263 


0.0580 


0.0804 


0.0404 


0.0545 


0.0262 


0.0404 


0.0300 


0.0213 


0.0587 


0.0804 


0.0398 


0.0562 


0.0213 


0.0398 


0.0350 


0.0161 


0.0518 


0.0807 


0.0390 


0.0573 


0.0160 


0.0390 


0.0400 


0.0109 


0.0569 


0.0798 


0.0380 


0.0582 


0.0109 


0.0380 


0.0450 


0.0055 


0.0560 


0.0803 


0.0372 


0.0598 


0.0054 


0.0372 


0.0500 





0.0591 


0.0810 


0.0364 


0.0612 





0.0364 



TABLE II 

Denoising using a hidden Markov source where the Markov source has transition probability p = 0.15. Here 8 s * 8 is 

FIXED AT 0.05, k = 2 AND SAMPLE SIZE IS 10 6 . 



of the largest feasible i5, while 8 (even asymptotically) is in most cases a strict upper bound to it. 
C. Two Dimensional Data 

We begin with simulations similar to those of the previous section, except now on a two-dimensionally-indexed 
simulated process (image) as the source. We implemented the MBD for 2D sources as in [6], and compared it 
to a 2D implementation of the DUDE for a BSC as in [11]. As before, we ran the M-DUDE with the estimates 
suggested in [10] found in J35l >. and our estimate from section ITVl T/. As in the previous section, for the source- 
and channel-dependent optimum performance benchmark we also ran DUDE using the true 5. 

Here the context used for denoising consists of the 3x3 square centered on the symbol to be denoised. 
Furthermore, for calculating and ^3, we use the 3 bits in the upper left-hand corner of the 3x3 square. 

For our experimentation we use a hidden random field (HRF). First we generate a binary random field causally 
by letting each pixel component depend stochastically on the pixel to its left and the pixel above it, where the left 
and top boundary are drawn according to a Bernoulli distribution with parameter 1/2. More specifically, 

Nlf if i = or j = 

N it j if x i-i,j x ij-i (36) 
N™j if Xi-i,j = Xij-i = 
Nfy if Xi-i,j = Xi,j-i = 1, 

1/2 ~ 

where x^j denotes the component at location and {N t j }, {N^} and {N?A are independent fields, consisting 
of independent components which are Bernoulli with parameters 1/2, a, and a = 1 — a, respectively. We then 
corrupt this field by a BSC with transition probability S s and the output is then used as the noiseless image for the 
simulation, i.e., analogously as in Figure replacing the Markov sequence by a random field. 

The hidden random field is then sent through a BSC with transition probability 6 and denoising is performed. 
We used a test image of size 2000 x 2000. Tables fTTTI and II VI show the bit error rate of the denoised image relative 
to the noiseless one. 

The results show a trend similar to that observed for one-dimensional signals. We notice that the MBD consistently 
outperforms the M-DUDE(<5 3 ) and is comparable in performance to the DUDE with the true channel parameter 
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Estimate of 6 


Denoising Performance 




<5 


r 3 


<5 3 


M-DUDE(r 3 ) 


M-DUDE(<5 3 ) 


DUDE 


MBD 





0.01 


0.0184 


0.0606 


0.0115 


0.0182 


0.0095 


0.0114 





0.02 


0.0279 


0.0717 


0.0169 


0.0225 


0.0169 


0.0169 





0.05 


0.0575 


0.105 


0.0354 


0.0457 


0.0354 


0.0354 





0.10 


0.106 


0.160 


0.0682 


0.0696 


0.0687 


0.0691 



TABLE III 

Denoising results for a hidden random field with a = 0.05, image size 2000 x 2000. 





Estimate of <5 


Denoising Performance 




5 


r 3 


Ss 


M-DUDE(r 3 ) 


M-DUDE(<5 3 ) 


DUDE 


MBD 


0.0500 


0.0556 


0.1019 


0.1187 


0.0619 


0.0677 


0.0556 


0.0550 


0.0750 


0.0294 


0.101 


0.119 


0.0757 


0.0836 


0.0295 


0.0542 


0.0250 


0.0263 


0.0504 


0.0639 


0.0335 


0.0337 


0.0263 


0.0628 


0.0375 


0.0135 


0.0499 


0.0640 


0.0416 


0.0428 


0.0137 


0.0268 


0.0150 


0.0052 


0.0207 


0.0313 


0.0156 


0.0163 


0.0051 


0.0112 


0.010 


0.0102 


0.0202 


0.0314 


0.0122 


0.0135 


0.0102 


0.0122 



TABLE IV 

Denoising results for hidden random field with a = 0.01, image size 2000 x 2000. 



5, (shown to be optimal in [10]). Again, the M-DUDE(f3) performs better than M-DUDE((5 3 ), and in fact does 
essentially as well as the MBD. As before, this is due to the fact that when the underlying data contains strong 
structure such as in the case of random fields, then T; tends to be close to the true channel parameter 8. This fact 
suggests that the M-DUDE(Ti) would be a good algorithm for denoising natural images, as is indeed observed in 
the examples that follow. 

We next present denoising results for a binary text image. We scanned half a page of text at a resolution of 
1000 x 600. In Table IV! we show a piece (approx. 1/6) of the original, noisy, and denoised images for 5 = 0.1. 
For this particular case the DUDE using the true channel parameter 5 had a normalized error rate of 0.0330, while 
the MBD had an error rate of 0.0479. The M-DUDE(<5 3 ) did better than the MBD, with error rate close to that 
of the DUDE with the true channel parameter. This should be contrasted with Tables [ill] and IIVI where the MBD 
consistently did better than the M-DUDE(<5 3 ). The high performance of the M-DUDE(<5 3 ) is explained by the fact 
that we were denoising a text image. A text image is composed of mostly white background which allows for any 
reasonable channel estimate to be highly accurate. Hence in this case the M-DUDE(<53) is effectively implementing 
the DUDE with the true channel parameter, the optimal denoiser. On the other hand, the MBD takes a more 
conservative approach believing that the true channel crossover probability could be anywhere between and I3. 

Our final result is the denoising of a binary image. We used a halftoned image of size 1200 x 1785. In Table IVTl 
we show the original, noisy, and denoised images at approximately 1/6 scale. In this case, the channel parameter 
8 is set to 0.02. As before, we implemented the DUDE using three channel parameters, the true channel parameter 
5, 63, and 13. We also denoised using the MBD. The normalized errors of the denoisers were, 0.019 for the DUDE 
with the true channel parameter, 0.0327 for the M-DUDE(<5 3 ), 0.0289 for the M-DUDE(f 3 ), and 0.019 for the 
MBD. Notice that both the M-DUDE(<5 3 ) and M-DUDE(f 3 ) inject errors. This is not the case for the MBD which 
reduces the amount of errors. Hence, unlike the text images of Table [V] the MBD outperforms the DUDE for 
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In a recent work [1], the authors introduced a discrete 
universal denoiser (DUDE) for recovering a signal with 
finite-valued components corrupted by finite-valued, un- 
correlated noise. The DUDE is asymptotically optimal and 

16 a recent work jlj, the mlwrs introduced a discrete 
taifstaMb J&iaissf IJUJDE)- Sci- T^eo^tafe ; a : ;S^^Kl ■ 
SsMte-vafaed -cara^sniferts corrupted ty iloile-Valijed, ufe- 
bitT^a|ey-j3»^c ! "Tiie asyjs|? iptkalf -|fjfiniii"-4ii<i- 

lu a rsceof work [ 1 j. the aurhors introduced a discrete 
universal drnoiittt (DUDE) tor rcco\ ering a signal ivitfi 
fijite-Woed component* corrupted by finite -valued, un- 
com-iated noic:. I lie DUDE is asymptotically optimal and 

■is s-raeeof- ymfe .[i£...tfee authors "tfieeduced- a. iftserefe 
universal iwo&m- IJDUDEJ- |«-rf©!yfei!ag "a signal with" 
fiyfc-vaifeed jtomporieats corrupted: by fwdte-v&Sued^ ufr 
is»elatedjH)isft. T&e DUDE is asyjEfptotifiaily optimal ihd - 

TABLE V 

Denoising of a text image. The top image is the original image, next is the NOISY VERSION WHERE (5 = 0.1, THEN A DENOISED 

VERSION USING THE DUDE WITH THE TRUE <5 AND FINALLY A DENOISED VERSION USING MBD. 



both channel estimates. The reason for this difference is due to the fact that with images incorporating fine detail, 
it is hard if not impossible to estimate the channel parameter. Stated otherwise, the original image is inherently 
noisy. Therefore denoisers that are optimized for a particular channel, such as the DUDE, may perform poorly 
and, in some cases, inject errors. Even with this difficulty to estimate the channel parameter, we observe that the 
M-DUDE(r3) outperformed the M-DUDE(<53). This is because, as shown in previous results, T; tends to be closer 
to the true maximally consistent channel parameter than <5/. In particular, for this case, T3 was 0.0524 and ^3 was 
0.0683. Also observe that, similarly to the hidden random field results of Tables ITTTI and II VI the MBD does as well 
as the DUDE with the true channel parameter. 

VII. Beyond Binary Alphabets 

Up to this point we have confined attention to binary alphabets. Our goal in this section is to extend the algorithms 
developed in Sections IlIIICI and II VIAI to the more general non-binary setting. In particular, we will address the case 
of non-binary alphabets with channels parameterized by a single parameter. In other words, the uncertainty in the 
knowledge of the channel can be described by the uncertainty of a single parameter, analogously as in the binary 
case where the uncertainty was in the the crossover probability of the BSC (knowing that the crossover probability 
is less than 1/2). We also require some structure in this free parameter, 8. We require that the channels be monotonic 
in 6, i.e., that if So is feasible for a given Pz then all < 5 < 5q are also feasible. As before, we say that S is 
feasible if there exists a source such that, when passed through the channel defined by S, the output statistics agree 
with Pz- 

Observe that with the above constraint, our definition of T(Pz), extended from the binary case in 0, becomes 

r(P z ) = max {S : 3P X s.t. P x * Ch{5) = Pz} , (37) 
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(D) (E) (F) 

TABLE VI 



Denoising of a halftone binary image. Image (a) is the original image, (b) is the noisy version where <5 = 0.02, (c) is the 

IMAGE DENOISED USING THE DUDE WITH TRUE CHANNEL PARAMETER, WITH NORMALIZED ERROR OF = 0.019, (D) IS THE IMAGE 
DENOISED USING THE DUDE WITH CHANNEL ESTIMATE 83, WITH NORMALIZED ERROR OF 0.0327, (E) IS THE IMAGE DENOISED USING 
THE DUDE WITH CHANNEL ESTIMATE f 3, WITH NORMALIZED ERROR OF 0.0289, AND (F) IS THE IMAGE DENOISED USING THE MBD, 

WITH NORMALIZED ERROR OF 0.019. 

where Px * Ch(S) denotes the distribution of the output process of the channel with the parameter 5 whose input 
process has distribution Px- We define for this setting a minimax performance benchmark, similarly to (|9j, 

H{P Z , Z) = lim lim sup min rff> (P z , Z) , (38) 

fe^oo „^oo fer k 1 

In) 

where here Cj is the loss of the sliding window scheme /, extended from the binary setting. 

By confining our channel uncertainty into a single parameter, we can preserve much of the structure from the 
binary alphabet case. With this structure intact we can adapt the algorithms from Sections IIIIICI and II VIAI to the 
single parameterized non-binary alphabet case. This adaptation needs to be done on a case by case basis. In the 
following section we illustrate how this is done for a particular family of non-binary channels. We denote the 
alphabet by A — {1, ... , M}, i.e. M is the alphabet size. The family of channels we will use for our example can 
be considered the M-ary generalization of the BSC. 
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A. The Symmetric Channel 



Let the probability transition matrix of the channel be given by 

/ 1- 



M-l 



M 



'I \ 



M-l 



V 



A/-1 



■•• 1-6 

M-l M-l ' 

It is readily verified that if a single variable X E A is distributed as 

/ {M-l) ai -S \ 

' (M-l)-MS ' 



P 



(M-l)q M -i-^ 
(M-l)-Md 
(M-l)-5-(M-l) SZflY 1 Oj 
\ (M-l)-Md * 

is corrupted by the above channel then the channel output distribution is 

M-l 



Pz = (ctl, ■ • ■ , OCM-1, 1 ~ 2J 



(39) 



We represent a denoiser for this single-observation problem as {g^j}*, = i, with djj denoting the probability that 



the denoiser outputs a reconstruction j upon observing i. In particular dij G [0, 1] and Y^,jLi dij = 1 Vt. The 
expected loss of this denoiser (in the single observation problem) is 

F({a l }?i 1 ,5,{d h] }l< =1 ) = 



(M — l)^ - S 
(M -I) -MS 

(M — l)a 2 - 8 
(M-l) - MS 



+ 



M 



7 E d ^ 



(M-l)-*- (M-l) E^ 1 ** 



(M - 1) - M<5 



M 



1 E 



M-l 

E 



(M - l)a* - 5 



(i-^)E^. 



^ (M - 1) - M<5 

( M-i)-5-(M-i)E-=r^ 



M 



7 E *j 



(M - 1) - MS 



M-l 

l _ (1 x\ \ ^. . . 

3 ' M — 



j^M 



1 E ^ 

i/M.j^M 



(40) 



where a, is defined in ( I39l l. 

Consider now a probability distribution on M-ary (2fc + l)-tuples, Pz k k - Consider a denoiser for one symbol 
based on observing a noise-corrupted (2k + l)-tuple around it / : A 2k+1 — > <S(.4). Such a denoiser can be thought 
of as a collection of single-symbol denoisers, one for each context, where if the current context is z~jr, 2* and we 
observe i as the middle symbol, the denoiser will change the symbol to j with probability i, ^i])[j]. We 
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define the following functional 



zZl,z1e{1.2,...,M} k 

(41) 



where P Zq \ z -^ z k denotes Pr(Zo = \\Z_^ = z_^, Zf = zf) under the source P Z k . We define 

fMM k Pz k ,iA = argmin max G k [P Z h ,S,f) , (42) 

L - fc J f 0<5<A \ - fc / 

selecting an arbitrary achiever when it is not unique. Let X 7hk ' 1 denote the n-block denoiser defined by, 

*["]'*'' (* n ) =/ MMfc (g 2fc+1 [2"],A ; (z")) k + l<t<n-k, (43) 

B. Algorithms for the Symmetric Channel 

Now that we have examined the general symmetric channel, we want to extend the algorithms from sections UlIICI 
and I1VIAI to this more general case. We observe first that the single parameter structure discussed earlier exists 
in the case of the symmetric channel. Combined with the invertibility of the channel matrix, this single parameter 
structure is all we need to adapt the algorithm in section lTVIAI to the generalized symmetric channel case. The same 
algebraic construction found in section ITVIAI follows with the modification that 



0fl±P(Z 1 =i\Z°_ l+1 = cf ) ) (44) 



af}±P(X 1 =i\X°_ l+1 =cV). (45) 

Hence the channel described by a particular S is /-feasible if and only if each row of the associated matrix is 
an element of the M-dimensional simplex. Therefore we have an algorithm that not only converges to A/, but also 
produces bounds with each iteration. 

Now that we have extended the algorithm from section ITVIAI we turn our attention to solving J42i . As in ill 2b 
we define J k by 

J k (P zk A, /) = max G k (p z , S,f). (46) 

\ / 0<<5<A \ — fc / 

Similarly to the analysis in Section I1I1ICI from d40l and J41i it follows that 

AS 2 + BS + C 



G k (P z * h ,5,f 



M — 1 — MS 

where A(P Z h ,/), B(P Z h ,/), and C(P Z k , f) are affine functions and can be derived as in the binary case of 

Section UlIICI Therefore J k can be expressed as 

( ri AA 2 +BA + C AS' 2 +B5' + AS" 2 + BS" + C t 1 

maX \ C > M-l-MA ' M-l-MS' l5 ' e(0 ' A) ' M-l-MS" ^"^j ' (47) 

where 



AM - A + JA(AM 2 - 2AM + A + BM 2 - BM - CM 2 ) 

= 1 ; and 

AM 

AM -A - ^/A(AM 2 - 2 AM + A + BM 2 - BM - CM 2 ) 

AM ' 
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Hence, as in Section lTlIICI Jk simplifies to the max between four points which are simple functions of the coefficients 
A, B and C. Hence, for a given denoiser /, the quantity Jk(f) is easily calculated. 

The analysis for the minimization of Jk for the simple binary case in section I111ICI is quite involved. Certain 
subtleties in the analysis suggest that the general M-array symmetric channel may not be piecewise convex or that 
finding the boundaries of the convex regions may be overly complicated. Hence this analysis needs to be carried 
out and verified for the particular alphabet size at hand. 

Since we cannot simply extend the algorithm from Section I111ICI to the general M-array symmetric channel case 
without analysis for each M of interest, is there something that can be done in general? Earlier it was shown 
that for a given denoiser / we can easily calculate Jk(f)- This suggests that although it might not be possible 
to find the absolute minimum of ■/&, we can apply methods such as simulated annealing to estimate the absolute 
minimum, see [8] and [9] for a detailed discussion of simulated annealing over a continuous domain. Both the 
simulated annealing methods discussed in [8] and [9] are concerned with unconstrained minimization. Since we are 
dealing with the constrained minimization of Jk , we once again will need to make use of the log barrier method as 
described in [1]. The benefit of using simulated annealing is that one can estimate the minimax optimal denoiser 
and have some control over the complexity versus accuracy of the estimation. The control over the trade off comes 
from controlling the annealing schedule. 

We have therefore managed to extend the algorithm from Section ITVIAI to the general Af-ary symmetric channel 
case. We have also shown that for a particular M, it is possible to extend the algorithm from Section lTlIICI to the 
general M-ary symmetric channel case, and that even if one cannot use the convex optimization methods developed 
in Section IIIIICI estimates of the minimax optimal denoiser can still be obtained using simulated annealing. Hence, 
in practice, for any M, one can apply the minimax denoiser for the Af-ary symmetric channel. 

C. Beyond Symmetric Channels 

The analysis and methods used in Sections I V11IAI and IV11IBI can be extended to many other families of channels. 
With use of simulated annealing, see [8] and [9], the above methods can be applied to any family of channels 
with the proper scalar parameterization and with easily calculated expressions for Jk- This significantly extends the 
possible applications of the minimax denoiser to the non-binary case. 

VIII. Minimax ^ Maximin 

A natural question arising in the context of our minimax criterion is whether it coincides with the maximin. 
An affirmative answer would imply that a minimax optimal scheme is an optimal scheme for the least denoisable 
source-channel pair consistent with the output distribution. This, in turn, would suggest that employing the DUDE 
of [10] tailored for the least denoisable source-channel pair, which can easily be estimated, would give rise to a 
universally minimax optimal and practical scheme. 

Unfortunately, as we now show, the minimax does not coincide with the maximin in our problem. Specifically, 
we shall argue that for some noise-corrupted sources the minimax is greater than the maximin. To show this we 
assume the input process, Px is Bernoulli p, p < 1/2, and is corrupted by a BSC (5) channel. We assume that the 
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channel crossover probability, 8, belongs to an uncertainty set U, which is given to us. In particular, assume that 
U = {0, 0.01, 0.02, . . . , 0.49}. As a performance measure we take the unconditional expected loss 



min max E lPx s] [L f (X n , Z n )]. 



(48) 



Corollary 2 of [4] ensures that the minimax denoiser, as defined in Section IIIIIAI attains the minimum in (I48> and 
hence justifies the use of the unconditional measure. 

With this setup, the output process, Pz, will be Bernoulli with parameter p * 5, where 5 € U, and we can assume 
it is known to us. In such a case, there are only two optimal schemes, as well as mixtures of the two. Depending 
on the channel parameter 5, the optimal scheme either outputs what it sees (<5 < p) or outputs all zeros (S > p). 
The transitional point is when 5 = p. 

We now consider the (5, p) pair such that 6 = p and p * 5 = Pr{Z = 1}; Denote the value of 5 and p that 
satisfy this by 5* . For this case it can be shown that the scheme attaining the minimum in (I48> assigns a unique 
probability a E (0, 1) to the "say what you see" scheme and a to the "say all zeros" scheme. In other words it 
is a mixture of the two optimal schemes. In order for the minimax to be equivalent to the maximin, the optimal 
minimax denoiser, in the sense of (I48> . would have to be an optimal denoiser for the worst possible channel, i.e. to 
swap the minimax for maximin, the channel maximizing the loss would also need to be a BSC with transition 
probability i5*. Figure |2]is a plot of S* and the 6 maximizing the loss with respect to Py{Z — 1}, denoted by S w . 
In other words, 5 W is the least denoisable channel which agrees with Pz- 




Fig. 2. Plot of <5* and 8 W with respect to the parameter of the Bernoulli noise-corrupted source. 



Evidently, it is not the case that S* — S w . Hence, with Figure [2] in mind, we can easily verify that for Pz equal 
to Bernoulli(0.18), 5* = 0.1, 

min max E iPx s] [L f (X n , Z n )] > max min E lPy . 5] [L f (X n , Z n )\ (49) 

/GM opt {{P x ,8):Seu,p x *s=p z } 1 x ' |L {(Px.,sy.6eu,Px*s=Pz} feMopt 1 x ' |L 

where M opt is the set of denoisers which are optimal for some (Pz, 6) pair, i.e., a denoiser is an element of M opt 

if there exists some pair (Pz, S) for which it is optimal. This observation leads us to the following: 
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Theorem 2 (Minimax ^ Maximin): There exist stationary ergodic sources Pz for which 

min max E lPx s] [L f (X n , Z n )] > max min £ [Px s] [Lt(X n , Z n )} (50) 

fer k {(p x ,6):5<=u,p x *6=p z } 1 x ' |L /v {{P^.syseu,p^*s=p z } s^ k 1 x ' |L /v 

Proof: Assume, by contradiction, that J50i does not hold when Pz is the Bernoulli (0.18) source. This implies that 
the minimizing denoiser is optimal for the worst possible channel. Hence to attain the minimax performance it is 
enough to minimize only over mixtures of optimal denoisers, i.e., 

min max E lPx s] [Lt (X n , Z n )\ = min max E [Pyi s \ [Lt (X n , Z n )\. (51) 

f&r k {(P x ,S):SeU,P x *S=P z } 11 f&M opt {(Px,<5):<5eW,Px*<5=Pz} 1 x ' 1 

Combining ( 15 1> with ( I49> gives 

min max £ [Px s] [L f (X n , Z n )\ > max min E ]Px ^[LtfX" , Z n )}. (52) 

fe^ k {(p x ,s):Seu,p x *s=p z } 1 x- |L {(Px,s):Seu,p x *s=p z } feM opt 1 x ' |L 

On the other hand clearly 

max min £ [Px s] \L f (X n , Z n )\ > max min s] [Lt (X n , Z n )} (53) 

{(Px,5):5GW,Px*<5=Pz}/eM opt lx ' |L {(P x ,S):5eU,P x *S=P z } feM k i x . J l 



which when combined with d52i implies that J50i holds, contradicting our assumption. 

□ 

As shown in Corollary 2 of [4], the minimax denoiser has the property of attaining the minimum in J48t . However, 
Theorem I3 implies that such a denoiser is not in general an optimal denoiser for the worst source-channel pair. 
Therefore, the minimax denoiser and an optimal denoiser for the worst source-channel pair are not in general the 
same. Accordingly, a denoiser designed to be optimal for the worse source-channel pair is not guaranteed to be 
minimax optimal. 

IX. Conclusions 

In [4], denoisers that are asymptotically optimal in a worst case sense are suggested for the setting of an unknown 
source corrupted by a DMC, under channel uncertainty. The present paper was dedicated to the implementation 
of these denoisers. We have presented efficient algorithms for implementing the denoisers suggested in [4] for the 
binary alphabet as well as for efficiently estimating the set of feasible channels in the uncertainty set. We also 
extended these algorithms to a large family of channels in the non-binary case, focusing on the generalized M-ary 
symmetric channel. It was shown that the suggested universally min-max denoisers do not correspond to schemes 
that attain the optimum distribution-dependent performance under the worst case source-channel pair. In general, 
the min-max denoisers are not optimal distribution-dependent schemes for any source-channel pair, implying that 
use of the DUDE of [10] with a channel estimate is suboptimal under the worst case loss criterion. We have also 
presented a natural modification to the original DUDE, M-DUDE(T;), which employs the DUDE using an estimate 
of channel parameter, which was shown to be consistent in the sense of converging to the largest feasible channel 
parameter. Simulations shown suggest that, in practice, this scheme may perform well in denoising images under 
channel uncertainty, often attaining performance which is comparable to that of the MBD. 
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